To use the procedure outside of the GitHub folder (recommended), you should replicate the same folder tree. Figure 1 provides an example of the structure to replicate. Please ensure that all the folders displayed in Figure 1 exist.
Data collected on board are stored in Access files, located in the “OnBoard/assess” subfolder. Ideally, a single .accdb file should contain no more than 20 hauls.
To start a new file, copy the template version “Maschera inserimento SOLEMON_template.accdb”, paste it in the same folder, and rename it “Maschera inserimento SOLEMON_YEAR_N.accdb”, where YEAR is the reporting year and N is a progressive number. Example: “Maschera inserimento SOLEMON_2024_1.accdb”. It does not matter if parts of the same haul end up in different files: this is handled in post-processing.
Each Access file contains a template sheet, called “cala_template”. To start a new haul, copy the template sheet, paste it, and rename it “cala_x”, where x is the haul number. Examples: cala_1, cala_7bis. It does not matter if parts of the same haul end up in different files: this is handled in post-processing.
In 2024, the structure of the sheets follows Figure 1. In detail:
Auto-compilation applies to some columns of the Access file: when the table is processed, each empty cell is assigned the first available value found in the previous rows. When collecting data, for these columns you only need to specify the value for the first observation, and then fill it in again only when it changes. These columns are:
The standard procedure applies when all the individuals of a given species are collected and reported.
When reporting data for target species, use only the first 7 columns. Of these, the columns gear and species_name auto-compile from the first record inserted, so you need to fill them in only the first time you report an observation (i.e. the first record of a gear, or the first record of a species). The column id_specimen stores any kind of individual ID (e.g. otolith codes, genetic samples).
When reporting data for elasmobranchs, use only the first 9 columns. Of these, the columns gear and species_name auto-compile from the first record inserted, so you need to fill them in only the first time you report an observation (i.e. the first record of a gear, or the first record of a species). The column id_specimen stores any kind of individual ID (e.g. otolith codes, genetic samples). Refer to Figure 4 for reporting the 3 length measures.
When reporting data for other commercial species, use only the first 4 columns. Of these, the columns gear and species_name auto-compile from the first record inserted, so you need to fill them in only the first time you report an observation (i.e. the first record of a gear, or the first record of a species).
For this species category, individual length and cumulative weight are recorded. The cumulative weight should be reported in the last record, as in Figure 5.
When reporting data for shellfishes (MUREBRA, HEXATRU, OSTREDU), use the columns as reported in Figure 6. Of these, the columns gear and species_name auto-compile from the first record inserted, so you need to fill them in only the first time you report an observation (i.e. the first record of a gear, or the first record of a species).
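The auto-compilation described above (gear and species_name written only on the first record, then filled down in post-processing) can be sketched as follows. This is an illustrative Python sketch, not the actual post-processing code; the record structure and example values are hypothetical:

```python
def fill_down(rows, columns):
    """Propagate the last non-empty value into empty cells for the given columns."""
    last = {c: None for c in columns}
    filled = []
    for row in rows:
        row = dict(row)
        for c in columns:
            if row.get(c) in (None, ""):
                row[c] = last[c]  # inherit the value from previous rows
            else:
                last[c] = row[c]  # a new value resets the fill-down source
        filled.append(row)
    return filled

# Hypothetical records: gear and species_name written only on the first row
records = [
    {"gear": "D", "species_name": "SOLEVUL", "length": 25},
    {"gear": "",  "species_name": "",        "length": 27},
    {"gear": "",  "species_name": "MULLBAR", "length": 14},
]
out = fill_down(records, ["gear", "species_name"])
# out[1] inherits gear "D" and species_name "SOLEVUL";
# out[2] inherits gear "D" but keeps its own species_name "MULLBAR"
```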
NB: in case of subsamples, refer to the dedicated section
Subsamples are taken when the number of individuals of a given species is too high to be fully processed. Considering the onboard practice used in the SOLEMON survey, subsampling happens in three cases. Figure 7 reports the steps followed in each case. Each case is treated differently depending on the type of species, and the data treatment is explained in the dedicated sections.
In terms of onboard procedure, the cases generally refer to:
A typical case of subsampling for target species is AEQUOPE, which usually occurs in large quantities and for which an LFD is needed. In this case only a few individuals are measured to obtain the length structure, and the total number (and sometimes total weight) must then be estimated. Individuals processed for length and weight are treated as any other target species. To store the other information needed to raise the values, you need to create a new record for the species (and gear) holding the subsample data. There are two expected cases:
ALL individuals are collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for individual processing. One record is dedicated to the subsample details: the total weight of the sorted sample (in kilograms) is reported in kg_field1, and the weight of the sorted subsample in kg_field2. type_subsample is “species”. The subsampled individuals are processed according to the standard procedure for target species, creating a new record for each specimen.
A subsample is taken from the haul (unsorted subsample), then ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details: the weight of the haul is reported in kg_field1, the weight of the unsorted subsample in kg_field2, and the weight of the sorted subsample in kg_field3. type_subsample is “haul”. The subsampled individuals are processed according to the standard procedure for target species, creating a new record for each specimen.
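In both cases, raising the counted individuals to the whole haul follows a weight-ratio logic. A minimal illustrative Python sketch follows; the actual computation is performed by the post-processing scripts, and this function name and the exact formula are assumptions:

```python
def raised_total(n_counted, kg_total, kg_subsample):
    """Raise a subsample count to the whole sample:
    n_total ~ n_counted * (kg_total / kg_subsample).

    Assumed mapping: for type_subsample = "species", kg_total is the sorted
    sample weight (kg_field1) and kg_subsample the sorted subsample weight
    (kg_field2); for type_subsample = "haul", kg_total is the haul weight
    (kg_field1) and kg_subsample the unsorted subsample weight (kg_field2).
    """
    return round(n_counted * kg_total / kg_subsample)

# Hypothetical numbers: 50 AEQUOPE measured in a 2 kg sorted subsample
# of a 10 kg sorted sample
print(raised_total(50, kg_total=10, kg_subsample=2))  # 250
```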
A typical case of subsampling for non-target species is MUREBRA. When it occurs in large aggregations, the total number (and sometimes the total weight) is estimated from subsamples. To store the information needed to raise the values, there are two expected cases:
ALL individuals are collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for estimating the total number. One record is dedicated to the subsample details: the total weight of the sorted sample (in kilograms) is reported in kg_field1, the weight of the sorted subsample in kg_field2, and the number of individuals in the sorted subsample in number_field1. type_subsample is “species”.
A subsample is taken from the haul (unsorted subsample), then ALL the individuals in the subsample are collected (sorted subsample) for estimating the total number. One record is dedicated to the subsample details: the weight of the haul is reported in kg_field1, the weight of the unsorted subsample in kg_field2, the weight of the sorted subsample in kg_field3, and the number of individuals in the sorted subsample in number_field1. type_subsample is “haul”.
This case has been used only for MUREBRA and HEXATRU in cases of exceptional catches. ALL individuals of these two species are collected from the haul (partially sorted sample), then a subsample is taken from the partially sorted sample (partially sorted subsample). The partially sorted subsample is sorted, i.e. divided into species (sorted subsample). The sorted subsample of each species is subsampled again (sorted sub-subsample) for estimating the total number. One record is dedicated to the subsample details: the weight of the partially sorted sample is reported in kg_field1, the weight of the sorted subsample for each species in kg_field2, the weight of the sorted sub-subsample in kg_field3, and the number of individuals in the sorted sub-subsample in number_field1. type_subsample is “multi”.
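Assuming the raising is multiplicative across the successive subsampling steps (an assumption; the exact formulas live in the post-processing scripts), the “multi” case can be sketched as:

```python
def chained_raise(n_counted, ratios):
    """Raise a count through successive (whole_kg, part_kg) subsampling steps."""
    total = float(n_counted)
    for whole_kg, part_kg in ratios:
        total *= whole_kg / part_kg  # one raising factor per subsampling step
    return round(total)

# Hypothetical numbers: 40 MUREBRA counted in a 0.5 kg sub-subsample of a
# 2 kg sorted subsample, raised again by a 30 kg / 6 kg sample-to-subsample ratio
print(chained_raise(40, [(2, 0.5), (30, 6)]))  # 800
```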
Excel files retrieved from SeaStar should be stored in the folder ‘data/minilog’ and named ‘minilogcode_shortdate’ (e.g. ‘K9478_1012’).
Data stored in the .accdb file are retrieved and handled by R scripts, located in the “R” folder. The required script is workflow_access_v0.
To process a single haul, run the lines up to line 15 and then set the following parameters:
# set parameters
haul=22
db='test'
updateID='N'
area_sepia='D'
year=2022
area='ITA17'
When the parameters are set, the data processing is pre-defined: you just have to run it without changing the parameters. The first step is function1, whose purpose is to format the Access table according to output standards. There is no need to inspect the output of this function: just run it.
# function1: extract data from the access db and format them
hauldata = function1(haul = haul,
                     db = db,
                     year = year)
When function1 is done, you can proceed with function2, whose purpose is to perform some checks. The checks are plots saved under the path ‘output/checks’.
# function 2: perform checks
function2(xdat = hauldata,
          haul = haul)
The purpose of function3 is to format data according to the trust format. Excel sheets are saved under the path ‘output/trust’.
# function 3: format data to trust format
trustdat = function3(xdat = hauldata[[1]],
                     haul = haul,
                     year = year,
                     weight_not_target = hauldata[[2]],
                     subsamples_target = hauldata[[3]],
                     catch_sample_disattivati = catch_sample_disattivati)
function4 creates a PDF report and saves it under the path ‘output/pdf’.
# function4: save PDF
function4(trustdat = trustdat,
          year = year,
          area = area,
          haul = haul)
To process more than one haul, take care to properly fill in the ‘haul_order’ Excel sheet (see the input data section). After loading the haul summary, just run the loop below.
haul_summary = read_excel("data/haul_order.xlsx")
haul_summary = haul_summary[1:5,] # keep only the hauls to process (first 5 in this example)
for(xhaul in 1:nrow(haul_summary)){
  # loop parameters
  haul = haul_summary[xhaul,]$haul
  db = haul_summary[xhaul,]$DB
  area = haul_summary[xhaul,]$country
  cat('processing haul no.', haul, '(', xhaul, '/', nrow(haul_summary), ')\n')
  # function1: extract data from the access db and format them
  hauldata = function1(haul = haul,
                       db = db,
                       year = year)
  # function 2: perform checks
  function2(xdat = hauldata,
            haul = haul)
  # function 3: format data to trust format
  trustdat = function3(xdat = hauldata[[1]],
                       haul = haul,
                       year = year,
                       weight_not_target = hauldata[[2]],
                       subsamples_target = hauldata[[3]],
                       catch_sample_disattivati = catch_sample_disattivati)
  # function4: save PDF
  function4(trustdat = trustdat,
            year = year,
            area = area,
            haul = haul)
}
TBD
This section explains the content of each file located in the data folder and briefly describes each file's purpose.
This file controls the formatting of trust templates. It indicates which samples (Station, Gear and SpecCode) should be flagged as “InUse” = FALSE in the catch sample files used as input data in trust.
| Station | Gear | SpecCode |
|---|---|---|
| 11 | D | AEQUOPE |
Stores the updated serial number used to identify specimens for which detailed samples (otoliths, genetics, etc.) were taken. This number should refer to the last ID assigned to a specimen. The use of this file is controlled by the updateID parameter in the workflow_access_v0 file: if updateID is set to Y, the fishID file is used to assign IDs (when requested) and is then updated. The columns refer to:
| type | code | fishID | haul | species |
|---|---|---|---|---|
| ELAS | Elas | 202 | NA | RAJAAST:RAJACLA:RAJAMIR:TORPMAR |
| solea | SS | 7061 | NA | SOLEVUL:SOLEAEG |
| RHO | SR | 447 | NA | SCOHRHO |
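The ID-update mechanics can be sketched as follows. This is an illustrative Python sketch; the real logic lives in workflow_access_v0, and this helper function is hypothetical:

```python
def assign_ids(last_id, n_specimens):
    """Return IDs for n new specimens, plus the updated last ID to write back."""
    ids = list(range(last_id + 1, last_id + 1 + n_specimens))
    return ids, ids[-1]

# ELAS row: the last fishID stored in the file is 202; three new specimens sampled
ids, new_last = assign_ids(202, 3)
print(ids, new_last)  # [203, 204, 205] 205
```

After the run, the fishID file would be rewritten with 205 as the new last ID for that row, so the next survey day continues the sequence without duplicates.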
Stores the information associated with hauls. This file is used (1) when the data workflow is applied in a loop and (2) by the minilog script. The columns refer to:
| day | haul | id | note | inizio | fine | valid | DB | country | peso_rapido_A | peso_rapido_D | peso_subcampione_a | peso_subcampione_D |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-11-24 | 31 | 1 | NA | NA | NA | 1 | 2022_1 | ITA17 | NA | NA | NA | NA |
| 2022-11-24 | 66 | 2 | NA | NA | NA | 1 | 2022_1 | ITA17 | NA | NA | NA | NA |
| 2022-11-24 | 29 | 3 | NA | NA | NA | 1 | 2022_1 | ITA17 | NA | NA | NA | NA |
Stores the length-weight parameters of target species. This file is used to reconstruct length (or weight) when it is not available in the recorded data. Examples: shrimps missing part of the tail but with the head intact are suitable only for length measurement; fish spoiled by the gear may be usable for weight but not measurable for length. The columns refer to:
| species_name | sex | a | b | source |
|---|---|---|---|---|
| SOLEVUL | NA | 0.00460 | 3.11 | benchmark_assessment |
| MULLBAR | NA | 0.00871 | 3.09 | fishbase |
| RAJACLA | NA | 0.00269 | 3.23 | fishbase |
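Reconstruction uses the standard allometric relation W = a · L^b. A quick sketch with the SOLEVUL parameters from the table above (units assumed to be cm and g, as is typical for these coefficients):

```python
def weight_from_length(length, a, b):
    """Allometric length-weight relation: W = a * L^b."""
    return a * length ** b

# SOLEVUL: a = 0.00460, b = 3.11 (from the table above)
w = weight_from_length(25.0, a=0.00460, b=3.11)
print(round(w, 1))  # roughly 102 g for a 25 cm sole
```

Length from weight is obtained by inverting the same relation, L = (W / a)^(1/b).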
Stores the codes of the maturity scales for target species. This file is used to format input files for trust.
| SPECIES | SEX | SCALE |
|---|---|---|
| MELIKER | F | MEDPF |
| MERLMER | F | MEDFI |
| MERLMER | M | MEDFI |
This is just the TB file updated to 2021, as stored in trust. It is used to perform some checks and should not be modified. Preview not shown.
This file is downloaded from the trust database and not modified. It contains the species list. Last update xxx. If you need to modify it, please do so in trust and then download the Excel file again.
| Species | Medits | Sp_Subcat | Len_Class | Notes |
|---|---|---|---|---|
| Aaptos aaptos | AAPTAAP | E | NA | NA |
| Abra prismatica | ABRAPRI | E | NA | NA |
| Abra sp | ABRASPP | E | NA | NA |
Contains the species that are targets for the survey (individual length and weight) and the molluscs for which only total weight and total number are needed.
| species_name | target |
|---|---|
| SOLEVUL | 1 |
| SOLEAEG | 1 |
| PLATFLE | 1 |